Abstract. In this paper we report our work on a new constraint domain, where
variables can take structured values. Earth-science data processing (ESDP) is a
planning domain that requires the ability to represent and reason about complex 
constraints over structured data, such as satellite images. This paper reports on a 
constraint-based planner for ESDP and similar domains. We discuss our approach 
for translating a planning problem into a constraint satisfaction problem (CSP) 
and for representing and reasoning about structured objects and constraints over 
structures. 

1 Introduction

Earth-science data processing (ESDP) at NASA is the problem of transforming low- 
level observations of the Earth system, such as data from Earth-observing satellites 
and ground weather stations, into high-level observations or predictions, such as crop 
failure or high fire risk. Given the large number of socially and economically important 
variables that can be derived from the data, the complexity of the data processing needed 
to derive them and the many terabytes of data that must be processed each day, there are 
great challenges and opportunities in processing the data in a timely manner, and a need 
for more effective automation. Our approach to providing this automation is to cast it 
as a planning problem: we represent data-processing operations as planner actions and
desired data products as planner goals, and use a planner to generate data-flow programs 
that produce the requested data. 

Many of the recent advances in planning, such as state-based heuristic search or 
reduction to satisfiability problems, are not readily adapted to ESDP problems, due to 
the following features: 

- universal quantification: Many commands and programs operate on sets of things, 
where membership in the set can be defined in terms of necessary and sufficient 
conditions. For example, the Unix ls command lists all files in a given directory.

- incomplete information: It is common for a planner to have only incomplete in- 
formation at the time of planning. For example, a planner is unlikely to know about 
all the files on the local file system until the ls command is executed.

- large, dynamic universe: The size of the universe is generally very large or infinite. 
For example, there are hundreds of thousands of files accessible on a typical file 
system and billions of web pages over the Internet. The number of possible files,
file pathnames, etc, is effectively infinite. Furthermore, new files can be created by 
executing actions, so the universe is dynamic. 

- complex data types: Files and other objects in the domain are complex data struc- 
tures, specified in terms of their attributes, which can, in turn, be complex data 
types. For example, a satellite image is specified by resolution, date, region, etc, 
the region can be specified by a pair of points defining its bounding box, and a 
point is a pair of coordinates designating the longitude and latitude. 

- complex constraints: Data-processing domains typically involve a rich set of constraints.
For example, specifications of data inputs and outputs include constraints
indicating geographic regions of interest, thresholds on resolution, data quality, file 
size, etc. Specifications of data-processing operations include constraints relating 
the inputs of the operations to the outputs, which are complex objects such as satel- 
lite images and weather forecast data. In the course of planning, additional con- 
straints arise specifying how parameters of an action depend on the parameters of 
other actions in the plan. 

We take the approach, like many other researchers [21,16,7,20], of translating the plan- 
ning problem into a constraint satisfaction problem (CSP). However, since data pro- 
cessing domains are substantially different from other planning domains that have been 
explored, our approach to translating planning problems to CSPs differs as well. For ex- 
ample, [7] use variables to represent goals and domains to represent available planner 
actions achieving the goals. Constraints are used to encode mutual exclusion relations. 
While this is an effective approach for propositional planning problems, we also need 
variables to represent objects and action parameters, and constraints to represent rela- 
tions among them. Thus, our encoding is somewhat more complex. 

For example, actions can have inputs and outputs, which are represented as vari- 
ables, typically of some complex (structured) type. Attributes of these inputs and out- 
puts may also be referenced by variables, and constraints over any of these variables 
may be specified as part of the action description. For example, an action that produces 
a scaled-down copy of an image might have a constraint specifying that the resolution 
of the output image equals the resolution of the input times a scale factor (which is a 
parameter of the action). Other attributes of the input image, such as the subject matter, 
will be unchanged in the output. 
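
The scaled-down copy example can be sketched in Python. This is an illustration only: the `Image` class, its two fields, and the functions `scale_effect` and `scale_constraint` are hypothetical names introduced here, not part of DPADL or of our planner.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Image:
    # Hypothetical attributes chosen for illustration only.
    subject: str
    resolution: float

def scale_effect(inp: Image, factor: float) -> Image:
    """Output is a copy of the input except for the resolution attribute."""
    return replace(inp, resolution=inp.resolution * factor)

def scale_constraint(inp: Image, out: Image, factor: float) -> bool:
    """Constraint relating input, output and the scale-factor parameter."""
    return (out.resolution == inp.resolution * factor
            and out.subject == inp.subject)

src = Image(subject="FPAR", resolution=8.0)
dst = scale_effect(src, 0.5)
```

The constraint form is the one a planner would reason with; the effect function merely shows one object that satisfies it.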

In this paper, we report our recent work on representing and reasoning about struc- 
tured objects, which we refer to as structures. Section 2 discusses data processing as 
a planning domain and our planning approach, focusing on how we handle structures. 
Section 3 discusses how we represent constraints over structures, and Section 4 de- 
scribes our approach for solving constraint problems that include structures. 

2 Planning in the data processing domain 

2.1 TOPS - a data processing domain 

The Terrestrial Observation and Prediction System (TOPS, http://www.forestry.umt.edu/ntsg/Projects/TOPS/) [18]
is an ecological forecasting system that assimilates data



Fig. 1. Structured inputs and outputs to a model in the TOPS domain. Solid arrows represent data
flow. Broken arrows represent sub-structure relations. 


from Earth-orbiting satellites and ground weather stations to model and forecast con- 
ditions on the surface, such as soil moisture, vegetation growth and plant stress. The 
goal of this system is to monitor and predict changes in key environmental variables. 
The inputs needed by a TOPS model run include satellite data, such as Fractional Pho- 
tosynthetically Active Radiation (FPAR), or Leaf Area Index (LAI) and weather data, 
such as precipitation. The input data may be obtained directly from data archives,
but most of the time the data obtained from data sources need to be processed
before the model run. A common sequence of data processing is: 1) gather data from 
multiple sources; 2) convert the data into a common representation, combine data, and 
perform other transformations; 3) feed the data into a model, for example, a simulation 
of gross primary production (GPP) of terrestrial vegetation; 4) convert the output of the 
model into some form suitable for visualization. Routine data processing can consume 
roughly 80% of manpower, with only 20% devoted to data analysis. As a result, the
vast majority of data are never used, in part due to the effort required to prepare data 
[17]. We have developed a planner-based agent, called IMAGEbot, to automate these 
data-processing activities. 

The data consumed and produced in the TOPS domain are all complex data structures,
such as spatial data. Figure 1 shows a simplified view of the data input and output
to a TOPS ecological model. 

2.2 Planning approach

The architecture of IMAGEbot is described in Figure 2. Planning domains are speci- 
fied in an expressive language called the Data Processing Action Description Language 
(DPADL) [9]. From the planning problems specified in DPADL, the planner incrementally
constructs a lifted planning graph, from which it extracts distance estimates for
heuristic search and also derives a CSP representation of the planning problem. Whereas 
a conventional planning graph [3] is a grounded representation, consisting of ground ac- 
tions and propositions, a lifted planning graph contains variables. This is a much more 
concise representation than an ordinary planning graph, but it is potentially less infor- 
mative. We use a constraint propagation algorithm to restrict the domains of variables in 
the graph, making it more informative. The focus of this paper is on the CSP represen- 
tation. The planning search and constraint propagation are conducted iteratively, until 
a plan is found or the planner proves that no valid plan exists in the planning graph, in 
which case, it either extends the planning graph, or admits failure. 



Fig. 2. The IMAGEbot architecture 


Actions and conditions An action is a tuple $(I, O, P, \pi, E, \chi)$, where $I$, $O$ and $P$ are the
input variables, output variables and parameters, respectively. All these variables are
typed. $\pi$ is the precondition, $E$ is a list of effects and $\chi$ is a procedure for executing the
action that may reference any variable in $I \cup P$ and must set every variable in $O$.

A full discussion of preconditions and effects in DPADL can be found in [9]; for 
the purposes of this paper, it suffices to observe that many goals and preconditions 
consist of requirements on the attributes of variables in I and many effect conditions 
consist of assignments to the attributes of variables in O and creation of new objects 
(which themselves are specified in terms of assignments on attributes). These conditions
can be expressed in the concise canonical form $v = \langle a_1, a_2, \ldots, a_n \rangle$, where $v$ is
a variable and $\langle a_1, a_2, \ldots, a_n \rangle$ is a structure specification, where each attribute $a_i$ may
be a variable, constant, structure specification, or $\emptyset$ (unspecified). For example, suppose
the attributes of a file are name, format, and size. A file can be represented as a
tuple $\langle$name, format, size$\rangle$. To specify the goal of finding a file $f$ named “foo.txt” whose
size is greater than 100, we could write $f = \langle \text{"foo.txt"}, \emptyset, s \rangle \wedge s > 100$.
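
As a minimal sketch of how such a goal reads operationally, the following Python matches a flat structure specification against concrete tuples. The `?`-prefix convention for variables, the `UNSPECIFIED` marker, and the sample file list are assumptions of this sketch, not DPADL syntax.

```python
UNSPECIFIED = None  # stands in for the "unspecified" attribute marker

def matches(spec, obj, bindings):
    """Match a flat structure specification against a concrete tuple.

    Strings starting with '?' are treated as variables (an assumption of
    this sketch). Returns extended bindings, or None on mismatch.
    """
    if len(spec) != len(obj):
        return None
    result = dict(bindings)
    for want, have in zip(spec, obj):
        if want is UNSPECIFIED:
            continue                      # goal does not care
        if isinstance(want, str) and want.startswith("?"):
            if result.setdefault(want, have) != have:
                return None               # variable already bound differently
        elif want != have:
            return None                   # constant mismatch
    return result

# Hypothetical files as (name, format, size) tuples.
files = [("foo.txt", "ascii", 250), ("foo.txt", "ascii", 40),
         ("bar.txt", "ascii", 900)]
goal = ("foo.txt", UNSPECIFIED, "?s")
# The pure constraint s > 100 is checked separately from the structure match.
hits = [f for f in files
        if (b := matches(goal, f, {})) is not None and b["?s"] > 100]
```

Keeping the pure constraint outside the structure match mirrors the separation between structure specifications and constraints in the canonical form.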

Like goals, effects can also have $\emptyset$-attributes, but the meaning is different. In goals,
$\emptyset$ means don’t care. In effects, it means default. Any variable $o \in O$ or new object may
be specified as a copy of some variable $d \in I$, in which case attributes of $o$ default to
the same value as attributes of $d$. If nothing else were specified, then $o$ would be a perfect
copy of $d$. However, what we are interested in is typically not perfect copies, but







Algorithm 1 Structure Unification. $E$ and $G$ are structure specifications from an effect
and goal, respectively. $B$ is the initial variable bindings and $\kappa$ is a variable designating
the object providing “default” values for $E$.
unify($E = \langle e_1, \ldots, e_k \rangle$, $G = \langle g_1, \ldots, g_k \rangle$, $B$, $\kappa$)

1. let $D := \emptyset$

2. unless (type($E$) super-type of type($G$))

(a) return $\bot$

3. for ($i = 1 \ldots k$)

(a) if ($e_i \neq \emptyset$)

i. if ($g_i$ is a structure and $e_i$ is a variable)

- $B$ := unify(new Structure($e_i$), $g_i$, $B$)

ii. else if ($e_i$ is a structure and $g_i$ is a variable)

- let $g_i$ := new Structure($g_i$)

iii. else $B$ := unify($e_i$, $g_i$, $B$)

(b) else

i. if ($g_i = \emptyset$) $g_i$ := new variable

ii. if ($D = \emptyset$)

- $D$ := new Structure($\kappa$) ($D = \langle d_1 = \emptyset, \ldots, d_k = \emptyset \rangle$)

- $B := B \cup \{\kappa = D\}$

iii. if ($d_i = \emptyset$) $d_i := g_i$

iv. else $B := B \cup \{d_i = g_i\}$

(c) else return $\bot$


imperfect ones. For example, there are many actions that change just one or two prop- 
erties of an object, such as file format, projection, resolution, size, or name. Specifying 
the outputs of those actions as copies of their inputs allows us to list only the attributes 
that are changed [12]. In our canonical form, an effect that changed only one attribute
of $o$ would be of the form $o = \langle \emptyset, \emptyset, \ldots, n, \ldots, \emptyset, \emptyset \rangle$, where $n$ is the new value for the
attribute that changed. All other attributes take on the corresponding value from $d$.
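
The default semantics of unspecified effect attributes can be illustrated with a short Python sketch. The function name `apply_effect` and the sample file tuple are hypothetical.

```python
UNSPECIFIED = None  # stands in for an unspecified (default) attribute

def apply_effect(effect, default_obj):
    """Fill unspecified effect attributes from the object being copied."""
    return tuple(d if e is UNSPECIFIED else e
                 for e, d in zip(effect, default_obj))

# Hypothetical input object d as a (name, format, size) tuple.
d = ("in.hdf", "hdf", 1200)
# An effect that changes only the format attribute.
effect = (UNSPECIFIED, "geotiff", UNSPECIFIED)
o = apply_effect(effect, d)
```

An effect consisting only of unspecified attributes yields a perfect copy of the input.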

Structure unification Given a goal and effect in canonical form, determining whether
(and under what conditions) the effect satisfies the goal is a simple matter of unification.
Pure constraints, such as $s > 100$ in the example above, can never appear in effects, so
they are just added to the CSP. Subgoals of the form $v = \langle a_1, a_2, \ldots, a_n \rangle$ are matched
against effects of the same form, using Algorithm 1. For the most part, this unification is
just term-by-term matching, but $\emptyset$ entries in effects, when matched with non-$\emptyset$ entries
in goals, result in delegation to whatever input variable the output is a copy of. This
delegation results in an equality constraint between the input variable and a new
structure specification, which is treated as a new subgoal (Figure 3).
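
The matching-with-delegation idea can be sketched for flat (non-nested) specifications as follows. This is a simplification and not Algorithm 1 itself: it omits types, nested structures and the bindings argument, and the names `unify_flat`, `is_var`, `FAIL` and the `?`-prefix convention for variables are assumptions of this sketch.

```python
UNSPECIFIED = None
FAIL = "FAIL"

def is_var(t):
    """Variables are strings starting with '?' (sketch convention)."""
    return isinstance(t, str) and t.startswith("?")

def unify_flat(effect, goal):
    """Return (equalities, delegated_subgoal), or FAIL.

    equalities: (effect_term, goal_term) pairs that must be made equal.
    delegated_subgoal: specification required of the copied input object,
    or None if nothing was delegated.
    """
    if len(effect) != len(goal):
        return FAIL
    equalities = []
    delegated = [UNSPECIFIED] * len(goal)
    for i, (e, g) in enumerate(zip(effect, goal)):
        if g is UNSPECIFIED:
            continue                       # goal does not care
        if e is UNSPECIFIED:
            delegated[i] = g               # defer to the copied input
        elif is_var(e) or is_var(g):
            equalities.append((e, g))      # becomes an equality constraint
        elif e != g:
            return FAIL                    # two different constants
    has_subgoal = any(x is not UNSPECIFIED for x in delegated)
    return equalities, (tuple(delegated) if has_subgoal else None)

# Effect of a hypothetical format-conversion action on (name, format, size).
result = unify_flat((UNSPECIFIED, "geotiff", UNSPECIFIED),
                    ("foo.tif", "geotiff", "?s"))
```

Here the format matches directly, while the name and size requirements are passed on as a new subgoal on the action's input.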

Lifted planning graphs The planner incrementally constructs a directed graph, similar
to a planning graph [3], but using a lifted representation (containing variables). This
graph is used to obtain distance estimates for heuristic search, and is also the basis for
the construction of the CSP. Arcs in the graph are analogous to causal links [19].
A causal link is a triple $(a_s, p, a_p)$, recording the decision to use action $a_s$ to support
precondition $p$ of action $a_p$. However, instead of using an arc to record a commitment of
support, we use it to indicate the possibility that $a_s$ supports $p$. The lifted graph contains
multiple ways of supporting $p$; the choice of the actual supporter becomes a constraint
satisfaction problem. We add an extra term to the arc for bookkeeping purposes: the
condition $\gamma_p^s$ needed in order for $a_s$ to achieve $p$. A link then becomes $(a_s, \gamma_p^s, p, a_p)$.

Given an unsupported precondition $p$ of action $a_p$, our first task is to identify all the
actions that could support $p$. Because the universe is large and dynamic, identifying all
possible ground actions that could support $p$ would be impractical, so instead we use a
lifted representation, identifying all action schemas that could provide support. Given
an action schema $a_s$, we determine whether it supports $p$ by regressing $p$ through $a_s$
(Figure 3). The result of regression is a formula $\gamma_p^s$. If $\gamma_p^s = \bot$, then $a_s$ does not support
$p$. The procedure for goal regression is straightforward, relying mainly on a unification-based
entailment test, as described in Algorithm 1. Initial graph construction terminates
when all preconditions have support or (more likely) a potential loop is detected.



Fig. 3. Goal regression on structures. Both outputs (1) and inputs (2) of actions can be described
as partial structure specifications (shown as double-bordered boxes). When matching an input
(in1) to an output (out1) during planning, these structures are unified (c), resulting in equality
constraints among attributes (labels on arc). Since the output out1 is defined as a copy of in2,
goal conditions on attributes of in1 that are undefined for out1 are delegated to in2, resulting in a
new subgoal.


From planning to constraints After the planning graph is constructed, a constraint
satisfaction problem (CSP) representing the search space is incrementally built. The
CSP contains: 1) boolean variables for all arcs, nodes and conditions; 2) variables for
all parameters, input and output variables and function values; 3) for every condition in
the graph, a constraint specifying when that condition holds (for conditions supported
by links, this is just the XOR of the arc variables); 4) for conjunctive and disjunctive
expressions, the constraint is the respective conjunction or disjunction of the boolean
variables corresponding to appropriate subexpressions; 5) for every arc in the graph,
constraints specifying the conditions under which the supported fluents will be achieved
(i.e., $\gamma_p^s \Rightarrow p$); 6) user-specified constraints; and 7) constraints representing structured
objects.

3 Constraints over structures 

We start by reviewing some standard CSP notation and then describe how we represent 
and reason about structures. 

3.1 Constraint satisfaction problems 

A Constraint Satisfaction Problem (CSP) consists of variables, domains, and constraints. 
Formally, it can be defined as a triple $\langle X, D, C \rangle$ where $X = \{x_1, x_2, \ldots, x_n\}$ is a finite
set of variables, $D = \{d(x_1), d(x_2), \ldots, d(x_n)\}$ is a set of domains containing values the
variables may take, and $C = \{C_1, C_2, \ldots, C_m\}$ is a set of constraints. Each constraint $C_i$
is defined as a relation $R$ on a subset of variables $V = \{x_i, x_j, \ldots, x_k\}$, called the constraint
scope. $R$ may be represented extensionally as a subset of the Cartesian product
$d(x_i) \times d(x_j) \times \ldots \times d(x_k)$, or implicitly using a constraint procedure [13]. A constraint
$C_i = (V_i, R_i)$ limits the values the variables in $V_i$ can take simultaneously to those assignments
that satisfy $R_i$. A solution to the problem is an assignment of values to variables in
X satisfying constraints in C. The central reasoning task (or the task of solving a CSP) 
is to find one or more solutions. 
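
For a small finite CSP, the definition can be made concrete with a brute-force enumeration. This sketch only illustrates the triple of variables, domains and constraints (with constraints given as procedures rather than relation tables); it is not the propagation-based solver discussed in this paper, and all names are chosen here for illustration.

```python
from itertools import product

def solve(variables, domains, constraints):
    """Enumerate all solutions of a small finite CSP by brute force.

    constraints: list of (scope, predicate) pairs, where scope is a tuple
    of variable names and predicate takes the values in scope order.
    """
    solutions = []
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints):
            solutions.append(assignment)
    return solutions

X = ["x1", "x2", "x3"]
D = {"x1": [1, 2, 3], "x2": [1, 2, 3], "x3": [1, 2, 3]}
# Two binary constraints given implicitly as procedures.
C = [(("x1", "x2"), lambda a, b: a < b),
     (("x2", "x3"), lambda b, c: b < c)]
sols = solve(X, D, C)
```

Enumeration of this kind is only possible when every domain is finite and known, which is exactly what the features listed next rule out.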

In addition, the CSP converted from the planning problem, as discussed in Section 2.2,
has the following features:

- Dynamics: variables and constraints may be dynamically added or removed from 
the problem, and values may be dynamically added or removed from domains. 

- Infinity: variables may have unknown or infinite domains, making it impossible to
represent constraints extensionally as relation tables. 

- Typed variables: variables may take different types of values, such as numbers, 
booleans, strings and structured objects. 

- Incompleteness: a solution to a problem doesn’t require a complete assignment to 
all the variables in the problem, in part due to dynamics. 

It has been reported in [8,10,11] how dynamics, infinity, string variables and constraints
are handled. In this paper, we focus on how to represent and reason about structured 
objects, or structures for short. We call variables that take structures as values structure 
variables. 



3.2 Structures 


Data objects often contain complex data structures 1 . Being able to efficiently represent 
and reason about these structures, as well as actions that create, copy or modify them, 
is essential. As discussed in Section 2.2, a structure can be specified in the concise
canonical form $\langle a_1, a_2, \ldots, a_n \rangle$, where $a_i$ is an attribute, which could be another structure
specification. In the CSP, it also seems natural to represent a structured object as a
tuple. For example, suppose a map has 3 attributes: resolution, region, and date. The
tuple $\langle 8, \text{USA}, 2003154 \rangle$ represents a map of the US taken on day 154 of
2003 with a resolution of 8 kilometers (one pixel corresponds to an 8 km square). If a single
object can be represented as a tuple, a set of objects of a given type can be represented 
as a relation table, where each row of the table is an object. If the structure attributes are 
represented as constraint variables (they are variables in the DPADL representation), a 
structure can be represented as a constraint, which is a relation by definition. Thus, it 
would seem unnecessary to have variables that take structures as values. 

Unfortunately, this representation is limited to the situation where all the structure 
instances of a given type are known, there are only a finite number of them, and there 
are no constraints on structures but only on structure attributes. If the structure instances 
are unknown or there are infinitely many of them, which is more likely in real-world 
applications, there will be no finite representation of the constraints; if there are constraints
on structures, we will end up with a CSP having 2nd-order constraints; that is,
constraints over constraints.

We take the approach of representing structures as constraints, but in a different 
form. We allow the values of variables to be structures. As mentioned before, these 
variables are called structure variables. Like other variables, structure variables can ap- 
pear in any form of constraints, and can be treated just as regular variables. Attributes 
of structures are also variables (which will be addressed as attribute variables if neces- 
sary). However, a structure variable also appears in a special form of constraint, called 
structure constraint. For each structure variable, we create a structure constraint on 
this structure variable and its attribute variables. For example, if we have a structure 
variable, x, that takes maps as values, we will have a structure constraint on <x, x.res,
x.region, x.time>, where x.res, x.region, and x.time are attribute variables of x.

Although a structure constraint on a structure variable represents a set of structured 
objects, the representation is often implicit; it relates a structure variable to the structure 
attributes, capturing certain conditions relevant to structures in the DPADL coding of 
the planning problem. Adding structure variables into the CSP coding is not redundant 
but necessary to represent constraints over the structures. Now the question becomes 
how to specify structure constraints. 

A structure constraint, like other constraints in our constraint library, is implemented 
as a procedure, or actually a collection of procedures over subsets of the structure and 
attribute variables. The applicability of these procedures depends on the domains of 
the structure and attribute variables, especially on whether they are finite or infinite.

1 There are recursive structured objects such as filesystem directories and non-structured or
semi-structured data objects such as the content of an image or text file, which are discussed
in [11]. Here we limit the discussion to non-recursive structures.



Algorithm 3 Structure Constraint Procedure

execute($x$, $\langle a_1, a_2, \ldots, a_k \rangle$, $M_x$, $M_a$, $A$)

1. if $d(x)$ is finite

(a) for each $v \in d(x)$

i. if $M_x$ is not empty, execute all procedures in $M_x$ on $v$;

ii. else execute default methods on $v$ to get attribute values;

(b) restrict $d(a_1), \ldots, d(a_k)$ with the new set of values;

(c) if any $d(a_i)$ becomes empty, return failure;

2. if $M_a$ is not empty, execute all procedures in $M_a$ on $a_1, \ldots, a_k$;

3. else if $d(a_1), \ldots, d(a_k)$ are finite

(a) for each tuple $\langle u_1, \ldots, u_k \rangle \in d(a_1) \times \ldots \times d(a_k)$

i. execute the default method on $\langle u_1, \ldots, u_k \rangle$ to get a value for $x$;

(b) restrict $d(x)$ by the new set of values;

(c) if $d(x)$ becomes empty, return failure;

4. add variables whose domains are changed to $A$;

5. return success;


For example, given the variable representing the structure $\langle 8, \text{USA}, \emptyset \rangle$, i.e., all 8-km
maps of the US, we can trivially determine the domains for the resolution and region
attributes, but we have no way of determining the domain of the date attribute. But
given a variable representing $\langle \emptyset, \text{USA}, 2003154 \rangle$, we can determine (via a procedure
call that translates into a database query) what maps of the US are available for that
date. The API call returns a set of the actual data structures (in the case of TOPS, Java
objects) representing each map. This set comprises a finite domain representation for the
structure variable. We can then determine the resolution for each map by executing the
appropriate procedure on the corresponding data structure, making that domain finite as
well. A more formal description of structure constraint execution is given in Algorithm 3.

A constraint gets executed by the propagator when any variables in the constraint 
scope are added to the agenda, which usually means their domains are changed. Exe- 
cuting a structure constraint not only enforces the compatibility between the structure 
and its attributes; it also initializes the domains of variables, which may have unknown 
domains at the beginning, and eliminates inconsistent values from the domains. 
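
The finite-$d(x)$ case of structure constraint execution can be sketched in Python. This is a loose illustration rather than a transcription of Algorithm 3: it handles only the case where the structure domain is known, it additionally filters that domain against already-known attribute domains, and the `Map` class, the use of `None` for an unknown domain, and the function name are all assumptions of this sketch.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass(frozen=True)
class Map:
    res: int
    region: str
    date: int

def execute_structure_constraint(x_dom, attr_doms, getters):
    """One pass of a structure constraint when d(x) is finite.

    x_dom: set of candidate objects, or None if unknown.
    attr_doms: attribute name -> set of values, or None if unknown.
    getters: attribute name -> function from object to attribute value.
    Returns (ok, x_dom, attr_doms, changed_variable_names).
    """
    changed = set()
    if x_dom is None:
        return True, x_dom, attr_doms, changed   # nothing to do here
    # Drop objects whose attributes fall outside known attribute domains.
    kept = {v for v in x_dom
            if all(attr_doms[a] is None or get(v) in attr_doms[a]
                   for a, get in getters.items())}
    if kept != x_dom:
        changed.add("x")
    if not kept:
        return False, kept, attr_doms, changed   # d(x) became empty
    # Initialize or restrict each attribute domain to observed values.
    new_doms = {}
    for a, get in getters.items():
        seen = {get(v) for v in kept}
        if attr_doms[a] is None or seen != attr_doms[a]:
            changed.add(a)
        new_doms[a] = seen
    return True, kept, new_doms, changed

getters = {name: attrgetter(name) for name in ("res", "region", "date")}
maps = {Map(8, "USA", 2003154), Map(1, "USA", 2003154), Map(8, "USA", 2003155)}
attr_doms = {"res": None, "region": {"USA"}, "date": {2003154}}
ok, kept, new_doms, changed = execute_structure_constraint(
    maps, attr_doms, getters)
```

In this run the unknown resolution domain becomes finite, and the changed variables are what a propagator would place on its agenda.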


3.3 Comparison to the hidden variable representation

The hidden variable representation is a binary representation of non-binary constraint
problems [6,1]. A k-ary constraint represented as a relation can be translated into a set
of k binary constraints by adding a hidden variable, with a domain containing the tuples
in the original constraint. A hidden variable represents the original constraint, and each
binary constraint (also called a hidden constraint) between the hidden variable and each
regular variable is a one-way functional constraint in which each value (a tuple in the
relation) of the hidden variable is compatible with at most one value in the domain of
the regular variable.
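
A minimal sketch of the hidden variable translation for an extensionally given constraint follows. The function names and the toy relation are illustrative assumptions.

```python
def hidden_variable_encoding(scope, relation):
    """Translate a k-ary extensional constraint into k binary constraints.

    Returns the hidden variable's domain (the tuples of the relation) and,
    for each original variable, a function mapping a hidden value to the
    single compatible value of that variable.
    """
    hidden_domain = list(relation)
    projections = {var: (lambda t, i=i: t[i]) for i, var in enumerate(scope)}
    return hidden_domain, projections

scope = ("x", "y", "z")
relation = {(1, 2, 3), (2, 2, 1)}
h_dom, proj = hidden_variable_encoding(scope, relation)

def consistent(assignment):
    """An assignment satisfies the original constraint iff some hidden
    value agrees with it through every binary (hidden) constraint."""
    return any(all(proj[v](h) == assignment[v] for v in scope)
               for h in h_dom)
```

Each projection is functional from the hidden variable to a regular variable, which is the one-way property described above.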

If an application problem involving structures doesn’t have constraints over struc- 
tures but does have constraints over structure attributes, the problem can be formalized 
as a non-binary CSP where attributes are variables and structures are constraints. Such 
a non-binary CSP can be transformed into a binary one via hidden variable transformation,
where hidden variables are actually structures. For example, in a crossword puzzle
problem, which is to find words from a given set of words that fit into the “word slots”
on an $n \times m$ puzzle board with blanks, words may be viewed as structures and letters as
attributes. There are only constraints on letters (i.e., on word slot intersections). This 
problem can be formalized as a non-binary CSP, where variables are the spaces on the 
puzzle board (excluding blanks) and constraints are the word slots, which can be repre- 
sented using relations (each tuple is a word of the same length in the given dictionary). 
This CSP can be transformed into a binary CSP, where hidden variables are word slots. 

If the problem has constraints over structures, representing structures as constraints
may lead to 2nd-order constraints, which should be avoided. The structure representation
we propose here indeed results in a non-binary CSP, but it is not suitable for
hidden variable transformation, because structure constraints are usually not specified
by relation tables.

However, we can have a binary structure representation: instead of a non-binary
constraint on the structure variable and its attribute variables, we can have a set of binary
constraints, each posed on the structure variable and one of its attribute variables.
This representation may be useful in limited situations; for example, if a binary
CSP solver is preferred.

Our reason for using k-ary structure constraints instead of a binary representation
is that the latter is inadequate to represent the ESDP domain. In practice, the structures 
we need to represent correspond to real data structures, in our case, Java objects in 
the TOPS system, and the constraint procedures are function calls on those data struc- 
tures made available by TOPS. While some of those function calls are a good fit for 
the hidden variable representation, namely, “getter methods,” i.e., functions that take a
structure as an argument and return one of its attributes, other function calls take multi- 
ple arguments and cannot be represented using binary constraints. By defining a general 
structure constraint in the constraint library, we are able to integrate the constraint rea- 
soning system with the runtime software environment so that the operations provided 
by the environment can be called when the constraint is executed during constraint 
propagation. The constraint system is able to query the environment, to dynamically 
determine what objects exist and what attributes or properties those objects have. 

3.4 Example - the model ingest problem 

A core problem of Earth science data processing is “model ingest”, i.e., preparing 
a number of data files representing Earth system variables (we will use the word s- 
variable to avoid confusion with constraint variables) and feeding them into a model. 
Typically, the model inputs are spatial data, such as satellite images, and other struc- 
tured data types, as shown in Figure 1. 


Most approaches to model ingest specialize on particular data sources, hard-coding 
choices such as resolution and projection. This is not required by the models them- 
selves, which typically compute their results on a pixel-by-pixel basis, where the indi- 
vidual pixels can be regarded as point data. What is required by the models is that the 
inputs are consistent, with the same resolution, projection, etc., so that corresponding 
pixels all refer to the same location and time. Committing to particular data sources 
ensures that the inputs are mutually consistent, but at the cost of flexibility; changing, 
say, from 8km data to 1km data requires a major recoding effort.

Since the reason for using a planner is to have a system that can use the best data 
for a given purpose, subject to availability and resource constraints, such as storage 
and bandwidth, we adopt the opposite approach, making no commitment to a particular 
data source, resolution or projection, but relying on constraints to ensure that they are 
compatible. In addition to selecting input files that are compatible, we also have the 
option of making them compatible, for example by reprojection, which is represented 
as a planner action. 

We consider a simplified version of this problem, where we are given a set of 20 
georeferenced spatial data files, and the goal is to find (or produce) 4 data files to feed 
into the model. For simplicity, we assume that these files are specified by a flat list of 10 
attributes such as resolution, projection, etc., and all of them are primitive types. These
files need to satisfy the following conditions:

- We need one file each for the s-variables FPAR, LAI, PRECIP and TEMP. 

- The 4 files have to be the same resolution, projection, and format.

- The 4 files have to cover the same geographic region.

- The date, cloudiness, and quality of these 4 image files have to be in the specified
range. In particular, FPAR and LAI don’t change very rapidly, so any file within
a week of the target date is acceptable, but cloudiness has to be low. Conversely,
PRECIP and TEMP must be for the exact date of interest, but clouds are irrelevant.

There are two actions available: reproject, which changes the projection of the input file,
and scale, which decreases the resolution.

This simplified problem is trivial for the IMAGEbot Planner, but for the purpose 
of being able to ignore other variables and constraints generated by the planner, let’s 
assume that the planner constructs a grounded planning graph (that is, every action 
node is grounded) as in Figure 4. Note that the diagram only shows actions for one 
goal; others are the same due to symmetry of the 4 subgoals. 



Fig. 4. A simplified planning graph 



Ignoring the actions, the problem at hand is a typical constraint problem, where 4
files need to be found from the given 20 input files, satisfying those conditions. Because
there are no constraints between files, only between attributes, we don’t even need structure
variables and structure constraints. The minimal CSP would have 4 × 10 variables,
one for each attribute of each file, and the user-specified constraints listed above, such as an
AllSame constraint on the resolution attribute variables. The input file objects can be
represented by a 10-ary constraint containing 20 tuples.
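
The action-free version of the problem can be sketched with a handful of made-up file records. The records, the reduction to three attributes, and the names `all_same` and `select` are assumptions made for illustration; a real instance would have 20 files with 10 attributes each.

```python
from itertools import product

# Hypothetical file records: (s_variable, resolution, projection).
files = [
    ("FPAR", 8, "latlon"), ("FPAR", 1, "sinusoidal"),
    ("LAI", 8, "latlon"), ("LAI", 1, "latlon"),
    ("PRECIP", 8, "latlon"),
    ("TEMP", 8, "latlon"), ("TEMP", 1, "sinusoidal"),
]
needed = ["FPAR", "LAI", "PRECIP", "TEMP"]

def all_same(values):
    """AllSame constraint: every value in the collection is identical."""
    return len(set(values)) <= 1

def select(files, needed):
    """Pick one file per s-variable such that resolution and projection agree."""
    candidates = [[f for f in files if f[0] == s] for s in needed]
    return [combo for combo in product(*candidates)
            if all_same(f[1] for f in combo)
            and all_same(f[2] for f in combo)]

solutions = select(files, needed)
```

With these records only the 8 km lat-lon files are mutually consistent, so exactly one combination survives.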

With actions involved, we still need to find the 4 files, but the files can be taken 
directly from the input file set or can be produced by the chosen actions. Therefore, 
we have constraints specifying the relation of inputs and outputs of planner actions, 
and specifying the causal relation of subgoals and conditions, all involving file objects.
For the graph in Figure 4, we have 16 actions; for each action, we have the following 
variables and constraints: 

- a structure variable for its input image, and 10 variables for the input’s attributes, 
and a boolean variable b representing whether the action is in the plan. 

- a structure constraint for the structure variable and its attributes, and 10 conditional 
equality constraints, each with the conditional variable b and a pair of attributes, 
one from the input image and the other from the output of the action supporting it. 
For example, suppose the variable scale is true iff the goal file g4 is produced by the 
scale action and the input of action scale is scaleIn. Then we have the constraint 
scale => .5 * scaleIn.res = g4.res, meaning that if the scale action is chosen to 
produce g4, then the resolution of g4 will be half the resolution of scaleIn. We also 
have the constraint scale => scaleIn.format = g4.format, meaning that the format 
of g4 will be the same as that of scaleIn. 

- other boolean variables for logical expressions that represent planner subgoals and 
planner action conditions, which we will ignore for now. 
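
A conditional equality constraint such as scale => .5 * scaleIn.res = g4.res might be propagated along the following lines. This is a minimal sketch that assumes finite domains represented as Python sets; the function name propagate_conditional and its signature are hypothetical and are not taken from the IMAGEbot implementation.

```python
def propagate_conditional(b, in_dom, out_dom, f=lambda x: x):
    """Propagate the constraint  b => f(input) == output  over finite domains.

    b is the set of truth values still possible for the action variable;
    in_dom and out_dom are the sets of values still possible for the two
    attributes. Returns the narrowed (b, in_dom, out_dom), or None on failure.
    """
    # Input values for which the equality can still be satisfied.
    supported_in = {x for x in in_dom if f(x) in out_dom}
    if not supported_in:
        # The equality can never hold, so the action cannot be in the plan.
        b = b - {True}
    elif b == {True}:
        # The action is in the plan, so the equality must hold.
        in_dom = supported_in
        out_dom = {f(x) for x in supported_in}
    if not b:
        return None
    return b, in_dom, out_dom
```

With f = lambda r: 0.5 * r, b = {True}, input resolutions {2, 4, 8} and output resolutions {1, 4}, the call keeps only the inputs 2 and 8. When b is still undecided and no input value is supported, the sketch instead rules the action out by removing True from b.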


4 Solving the constraint problem 

4.1 Reasoning about structures 

Structure unification, as discussed in Section 2.2, and as illustrated above, creates a 
fairly large number of equality constraints over structures and structure attributes, which 
provides an opportunity for symbolically reasoning about structures. 

When the planning graph is built, we create an additional symbolic constraint net- 
work containing only equality constraints, mainly on structures (note that a regular 
variable can be considered a structure of a primitive type). This network of equality 
constraints is propagated before creating the regular CSP. The propagation aims mainly 
at eliminating unnecessary variables and constraints. It also detects inconsistencies. The 
propagation and unification algorithms are given in Algorithms 4 and 5. 
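
The idea behind Algorithms 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the planner's code: the classes Var and Struct and the helpers struct, resolve and propagate are hypothetical names. The sketch records substitutions in a bindings table instead of rewriting the network, uses a worklist of equalities rather than an agenda of structures, also binds variables to other variables, and omits an occurs check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A free variable, identified by name."""
    name: str


@dataclass(frozen=True)
class Struct:
    """A structure: a type name plus a sorted tuple of (attribute, term) pairs."""
    type: str
    attrs: tuple


def struct(type_, **attrs):
    return Struct(type_, tuple(sorted(attrs.items())))


def resolve(term, bindings):
    """Follow variable bindings until an unbound term is reached."""
    while isinstance(term, Var) and term in bindings:
        term = bindings[term]
    return term


def propagate(equalities):
    """Process (left, right) equalities; return the bindings, or None on failure."""
    agenda = list(equalities)
    bindings = {}
    while agenda:
        left, right = agenda.pop()
        left, right = resolve(left, bindings), resolve(right, bindings)
        if left == right:
            continue  # trivially satisfied, so the constraint is dropped
        if isinstance(left, Var):
            bindings[left] = right
        elif isinstance(right, Var):
            bindings[right] = left
        elif isinstance(left, Struct) and isinstance(right, Struct):
            l_attrs, r_attrs = dict(left.attrs), dict(right.attrs)
            if left.type != right.type or l_attrs.keys() != r_attrs.keys():
                return None  # structures of different types cannot be equal
            # Equal structures must agree on every attribute.
            agenda.extend((l_attrs[a], r_attrs[a]) for a in l_attrs)
        else:
            return None  # two different constants, or a constant and a structure
    return bindings
```

For instance, unifying struct("Image", res=Var("r"), format=Var("f")) with struct("Image", res=4, format="HDF") binds r to 4 and f to "HDF", so both attribute variables can be eliminated before the regular CSP is created.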

Algorithm 4 Propagation by unification 

Let N be a network of equality constraints, A a set of structures called the agenda, and let L and R 
be two structures. 

propagate(A) 

1. while A is not empty, do 
   (a) C <- set of constraints containing structures in A; 
   (b) for each L = R ∈ C 
       i. remove L = R from C; 
       ii. if unify(L, R, A, N) returns failure, return failure; 
2. return success; 


Algorithm 5 Unification 

Let N be a network of equality constraints, A a set of structures called the agenda, and let L and R 
be two structures. 

unify(L, R, A, N) 

1. if L = R, remove L = R from N, return success; 
2. if L and R are not the same type, return failure; 
3. if L and R are constants and L ≠ R, return failure; 
4. if R is a constant 
   (a) if L is a free variable, substitute R for L in N; remove L = R 
   (b) add all structures changed by substitution to A; 
5. if L is a constant 
   (a) if R is a free variable, substitute L for R in N; remove L = R 
   (b) add all structures changed by substitution to A; 
6. for each pair of attributes L_a and R_a, add L_a = R_a to N 
7. return success; 


4.2 Propagation and search 

Constraint propagation is an essential part of solving the constraint problem generated 
from a planning problem, not only because it eliminates inconsistent values, but also 
because the constraint problem at hand contains universally quantified variables with 
infinite domains, which cannot be enumerated by search. The propagation on regular 
variables (including structure variables, which are treated as regular variables) is 
straightforward: whenever the domain of a variable is changed, the constraints containing 
this variable are executed. If executing a constraint results in any variable domain 
changes, the constraints containing the changed variables are executed in turn. This 
process continues until there are no further changes, or a constraint execution fails; that 
is, the execution results in an empty variable domain. Constraint execution failure implies 
that the constraint network is not consistent. The propagation enforces generalized 
arc-consistency [2,14]. 
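
The propagation loop on regular variables can be sketched as follows. This is a simplified illustration that assumes small finite domains stored as Python sets and constraints given as (scope, predicate) pairs; it enforces generalized arc-consistency by brute-force enumeration, which would not work for the infinite domains mentioned above. The names revise and propagate are chosen for this sketch only.

```python
from itertools import product


def revise(scope, predicate, domains):
    """Execute one constraint: keep only values that occur in a satisfying tuple.

    Returns the set of variables whose domains changed.
    """
    supported = {v: set() for v in scope}
    for values in product(*(domains[v] for v in scope)):
        if predicate(*values):
            for v, val in zip(scope, values):
                supported[v].add(val)
    changed = {v for v in scope if supported[v] != domains[v]}
    for v in changed:
        domains[v] = supported[v]
    return changed


def propagate(constraints, domains):
    """Re-execute constraints until nothing changes; False means a domain is empty."""
    queue = list(range(len(constraints)))
    while queue:
        scope, predicate = constraints[queue.pop()]
        changed = revise(scope, predicate, domains)
        if any(not domains[v] for v in scope):
            return False  # constraint execution failed: the network is inconsistent
        # Schedule every constraint that contains a changed variable.
        for i, (other_scope, _) in enumerate(constraints):
            if i not in queue and changed & set(other_scope):
                queue.append(i)
    return True
```

The loop terminates because each execution can only shrink domains, and a constraint is rescheduled only when one of its variables has changed.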

We have implemented a heuristic planning search algorithm and a few constraint 
search algorithms, such as backtracking, backjumping and conflict-directed backjump- 
ing, all of which interleave search with propagation; that is, when the search algo- 
rithm assigns a value to a variable, the changes are propagated. The high-level plan- 
ning search, guided by heuristic distance estimates extracted from the planning graph 
[3], selects planner subgoals to achieve, and planner actions to achieve the subgoals. 
The constraint search finds values for variables representing planner action parameters. 
This is necessary to make actions executable. During the search, propagation is per- 
formed whenever a value is assigned to a variable. The search is an iterative process 
involving possible backtracks; that is, if there are no valid parameters for a chosen ac- 
tion, the planner has to search for another plan; if it is impossible to extract a plan from 
the current planning graph, the planning graph has to be extended. 
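
The interleaving of search and propagation at the constraint level can be illustrated with plain chronological backtracking. This is a sketch under simplifying assumptions: domains are finite sets, constraints are (scope, predicate) pairs, and the propagate helper is a compact stand-in for the planner's propagation. The high-level planning search, backjumping and planning-graph extension are not modeled, and all names are hypothetical.

```python
import copy
from itertools import product


def propagate(constraints, domains):
    """Prune unsupported values until nothing changes; False means a domain emptied."""
    changed = True
    while changed:
        changed = False
        for scope, predicate in constraints:
            for i, var in enumerate(scope):
                keep = {
                    values[i]
                    for values in product(*(domains[v] for v in scope))
                    if predicate(*values)
                }
                if keep != domains[var]:
                    domains[var] = keep
                    changed = True
                if not keep:
                    return False
    return True


def search(constraints, domains):
    """Chronological backtracking; every assignment is followed by propagation."""
    if not propagate(constraints, domains):
        return None
    unassigned = [v for v in domains if len(domains[v]) > 1]
    if not unassigned:
        return {v: next(iter(vals)) for v, vals in domains.items()}
    var = min(unassigned, key=lambda v: len(domains[v]))  # smallest domain first
    for value in sorted(domains[var]):
        trial = copy.deepcopy(domains)  # copy so a failed branch leaves no trace
        trial[var] = {value}
        solution = search(constraints, trial)
        if solution is not None:
            return solution
    return None  # every value failed, so the caller must backtrack
```

Copying the domains before each assignment keeps the sketch short; an implementation concerned with efficiency would more likely record and undo domain changes.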


5 Conclusions 

We have discussed a novel approach to constraint-based planning in domains that in- 
volve structured objects. The planner is implemented and used in the IMAGEbot planner- 
based agent, which has been applied to the TOPS ecological forecasting application. Al- 
though our focus in this paper was the Earth-science data processing (ESDP) domain, 
the planner is domain-independent, and the approach presented is broadly applicable. 
Structured objects, in the form of complex data structures, are ubiquitous in software 
domains, and may be useful in areas such as custom hardware configuration and manu- 
facturing. 

Contributions of the paper include the introduction of structure variables and struc- 
ture constraints over those variables and a symbolic propagation algorithm that simpli- 
fies the constraint network. 

There has been relatively little work relating to structured objects in either planning 
or constraint reasoning. One notable exception is the COLLAGE planner [15], which 
dealt with structured objects in the KHOROS image processing domain and was also 
based on constraint reasoning. Another constraint-based planner for a similar domain 
is the Multi-mission Vicar Planner (MVP) [5]. Both of these are action-decomposition 
planners, which have less of a need to represent and reason about data transformations 
in the way that IMAGEbot does, and neither supports the kind of structure constraints 
discussed in this paper. Work on workflow planning for computation grids [4] addresses 
a similar problem to ours, though its focus is on mapping pre-specified workflows onto a 
specific grid environment, whereas our focus is on generating the workflows. 